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A METHOD FOR RECOVERING 3D SCENE STRUCTURE AND CAMERA 



MOTION FROM POINTS, LINES AND/OR DIRECTLY FROM THE IMAGE 



INTENSITIES 



5 RELATED APPLICATIONS 
^ ^The present appk(cation claims priority of provisional application 60/210,099 filed on June 7, 

^^ I \ 

2000. \ 

The present application \related to U.S. application Serial No. ,titled A Method for 

Recovering 3D Scene Strucrare and Camera Motion Directly fi'om Image Intensities, filed on 
10 , by the same inv^or as the present application, which related application is 

incorporated herein by reference. 
BACKGROUND OF THE INVENTION 
\. Field of the Invention 

The present invention is directed to a method for recovering 3D structure and camera motion, 
15 and more particularly to a linear algorithm for recovering the structure and motion data fi*om 
points, lines, and/or directly from the image intensities. 
2. Prior Art 

The science of rendering a 3D model fi-om information derived fi'om a 2D image predates 
computer graphics, having its roots in the fields of photogrammetry and computer vision. 

20 

Photogrammetry is based on the basic idea that when a picture is taken, the 3D world is projected 
in perspective onto a flat 2D image plane. As a result, a feature in the 2D image seen at a 
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particular point actually lies along a particular ray beginning at the camera and extending out to 
infinity. By viewing the same feature in two different photographs the actual location can be 
resolved by constraining the feature to lie on the intersection of two rays. This process is known 
as triangulation. Using this process, any point seen in at least two images can be located in 3D. 
5 It is also possible to solve for unknown camera positions as well with a sufficient number of 
points. The techniques of photgrammetry and triangulation were used in such applications as 
creating topographic maps fi"om aerial images. However the photogrammetry process is time 
intensive and inefficient. 

10 Computer vision techniques include recovering 3D scene structure fi^om stereo images, where 
correspondence between the two images is established automatically fi'om two images via an 
iterative algorithm, which searches for matches between points in order to reconstruct a 3D 
scene. It is also possible to solve for the camera position and motion using 3D scene structure 
from stereo images. 

15 

Current computer techniques are focused on motion-based reconstruction and are a natural 
application of computer technology to the problem of inferring 3D structure (geometry) fi*om 2D 
images. This is known as Structure-fi*om-Motion. Structure from motion (SFM), the problem of 
reconstructing an unknown 3D scene fi'om multiple 2D images, is one of the most studied 
20 problems in computer vision. 
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Most SFM algorithms that are currently known reconstruct the scene from previously computed 
feature correspondences, usually tracked points. Other algorithms are direct methods that 
reconstruct from the images intensities without a separate stage of correspondence computation. 
Previous direct methods were limited to a small number of images, required strong assumptions 
5 about the scene, usually planarity or employed iterative optimization and required a starting 
estimate. 



These approaches have complementary advantages and disadvantages. Usually some fraction 
of the image data is of siich low quality that it cannot be used to determine correspondence. 

10 Feature-based method address this problem by pre-selecting a few distinctive point or line 
features that are relatively easy to track, while direct methods attempt to compensate for the 
low quality of some of the data by exploiting the redundancy of the total data. Feature-based 
methods have the advantage that their input data is relatively reliable, but they neglect most 
of the available image information and only give sparse reconstructions of the 3D scene. 

15 Direct methods have the potential to give dense and accurate 3D reconstructions, due to their 
input data's redundancy, but they can be unduly affected by large errors in a fraction of the 
data. 

A method based on tracked lines is described in "A Linear Algorithm for Point and Line 
20 Based Structure from Motion", M. Spetsakis, CVGIP 56:2 230-241, 1992 , where the 

original linear algorithm for 13 lines in 3 images was presented. An optimization approach is 
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disclosed in C.J. Taylor, D. Kriegmarm, "Structure and Motion from Line Segements in 
MultipleImages,"PAMI 17:11 1021-1032,1995. Additionally, in "A unified factorization 
algorithm for points, line segments and planes with uncertainty models" K. Morris and 1. 
Kanade, ICCV 696-702, 1998, describes work on lines in an affine framework. A projective 
5 method for lines and points is described in "Factorization methods for projective structure 
and motion", B. Triggs, CVPR 845-851, 1996, which involves computing the projective 
depths from a small number of frames. "In Defense of the Eight-Point Algorithm: PAMI 19, 
580-593, 1995, Hartley presented a full perspective approach that reconstructs from points 
and lines tracked over three images. 

10 

The approach described in M. Irani, "Multi-Frame Optical Flow Estimation using Subspace 
Constraints," ICCV 626-633, 1999 reconstructs directly from the image intensities. The 
essential step of Irani for recovering correspondence is a multi-frame generalization of the 
optical-flow approach described in B. Lucas and T. Kanade, "An Iterative Image Registration 

15 Technique with an Application to Stereo Vision", IJCAI 674-679, 1981, which relies on a 
smoothness constraint and not on the rigidity constraint . Irani uses the factorization of D 
simply to fill out the entries of D that could not be computed initially. 

Irani writes the brightness constancy equation (7) in matrix form as A = -DI , where D 
tabulates the shifts d* and I contains the intensity gradients V / (pj. Irani notes that D has 

20 rank 6 (for a camera with known calibration), which implies that A must have rank 6. To 
reduce the effects of noise, Irani projects the observed A onto one of rank 6. Irani then 
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applies a multi-image form of the Lucas-Kanade approach to recovering optical flow which 
yields a matrix equation DI2 = - A2, where the entries of I2 are squared intensity gradients 
sunmied over the "smoothing" windows, and the entries of A2 have the form I3 AI. Due to 
the added Lucas-Kanade smoothing constraint, the shifts D or d'n can be computed as D ^-A2 
5 [Ij]"^ denotes the pseudo-inverse, except in smoothing windows where the image intensity is 
constant in at least one direction. Using the rank constraint on D, Irani determines additional 
entries of D for the windows where the intensity is constant in one direction. 

SUMMARY OF THE INVENTION 

10 

The present invention is directed to a method for recovering 3D scene structure and 
camera motion from image data obtained from a multi-image sequence, wherein a reference 
image of the sequence is taken by a camera at a reference perspective and one or more 
successive images of the sequence are taken at one or more successive different perspectives 
15 by translating and/or rotating the camera. The method comprising the steps of: 

(a) determining image data shifts for each successive image with respect to the 
reference image; the shifts being derived from the camera translation and/or rotation from the 
reference perspective to the successive different perspectives; 

(b) constructing a shift data matrix that incorporates the image data shifts for each 

20 image; 
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(c) calculating two rank-3 factor matrices from the shift data matrix using S VD, one 
rank-3 factor matrix corresponding the 3D structure and the other rank-3 factor matrix 
corresponding the camera motion; 

(d) recovering the 3D structure from the 3D structure matrix using SVD by solving a 
linear equation; and 

(e) recovering the camera motion from the camera motion matrix using the recovered 
3D structure. 

The method of the invention is a general motion algorithm wherein the camera positions for 
each successive perspective do not lie on a single plane. The method can reconstruct the 
image from points, lines or intensities. 

The present invention is an essentially linear algorithm that combines feature-based 
reconstruction and direct methods. It can use feature correspondences, if these are available, 
and/or the image intensities directly. Unlike optical-flow approaches such as that described 
in "Determining Optical Flow", B.K.P Horn and B.G. Schunck, AI 17, 185-203, 1981, the 
algorithm exploits the rigidity constraint and needs no smoothing constraint in principle 
assuming the motion is small and the brightness constancy equation is valid. However, in 
practice, it is sometimes necessary to impose smoothness in the algorithm. 



The method of the present invention can reconstruct from both point and line features. This 
is an important aspect of the present invention, since straight lines are abundant, especially in 
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human-made environments, and it is often possible to track both feature types. Lines can be 
locaHzed very accurately due to the possibility of finding many sample points on a single 
line, and they do not suffer from some of the problems of point tracking such as spurious T- 
junctions and the aperture effect. The method described here is the first that can reconstruct 
5 firom all three types of input data in any combination or separately over a sequence of 
arbitrarily many images under full perspective, 

BRIEF DESCRIPTION OF THE DRAWINGS 

10 These and other features, aspects, and advantages of the methods of the present invention will 
become better xmderstood with regard to the following description, appended claims, and 
accompanying drawings where: 

FIG, 1 schematically illustrates a hardware implementation of the present invention. 

15 

FIG. 2 is a block diagram that illustrates the method of the present invention. 
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

20 Definitions 

For purposes of this disclosure, the invention will be described in accordance with the 
following terms defined herein. The method of the present invention assumes that the 
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calibration is known and takes the focal length to be 1 (wlog). Modifications to the method 
for uncalibrated sequences will be discussed later. 

Assuming a sequence of images. Choose the zeroth image as the reference image. LetR*, 
T* denote the rotation and translation between this image and image /. Parameterize small 

0) = {o)'0'o)l) \t'] =(T'T'y 

5 rotations by the rotational velocity ^ ^ . Let ^ v ^ >'/ ^ 

Let a 3D point P transform as P - ^i^'^) . Assume Np points and Nl lines tracked over the 
sequence. For clarity, we use a different notation for the tracked points than for the pixel 
positions in the intensity images. Let q'^ = [q^ '^v)m ^^^^^e the m-th tracked point and 

A]^{a,p,Y)^ denote the /-th line in the /-th image. Let ^ ^ (^'^) . Let R*q denote the 

^.\[Rtj^{Rq^^ 



Define the 



image point obtained from q after a rotation 
three point rotational flows ''^'^(^'^)'''^'^(^'-v).'"^'^(^':>'). by 

L ^ ^ -I . denotes the 3D unit vector 

^'W . Similarly, ^=^^1^1 . Let ^^^^^'^^ denote the image 

displacement for the /w-th tracked point andg^ = {Q^,Qy,Q^ denote the 3D scene point 

corresponding to the tracked point q^ . Then parameterize a 3D line L by two places that it 
lies in. The first plane, described by A, passes through the center of the projection and the 
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imaged line in the reference image. The normal to the second plane, B, is defined by 
requiring^ *A = 0 and BQ = -l for any point Q on L. 

Let = {x^,y^ y be the image coordinates of the n-the pixel position. Order the pixels from 

I to A^^ , the total number of pixels. Let/' denote the i-the intensity image and let 

I I = r {p^ ) denote the image intensity at the n-the pixel position in /' . Let P„ denote the 
3D point imaged at p„ in the reference image, with ={X^,X^,XnY in the coordinate 

5 system of . Let denote the shift in image position from 7° to / of the 3D point . 

Suppose V is a set of quantities indexed by the integer a. We use the notation {v} to 
denote the vector with elements given by the F'' . 

10 Algorithm Description-Preliminary Processing 

The method of the present invention requires that the translational motion not be to large, 

(e.g., with |r/Z^iJ < 1/3 and that the camera positions do not lie in a plane. 

Given tracked points and lines, we approximately recover the rotations between the reference 

image and each following image by minimizing: 



15 




(1) 



m 



with respect to the rotations /?' , where one should adjust the constant according to the 



relative accuracy of the point and line measurements. Minimizing equation (1) gives 
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accurate rotation as long as the translations are not too large. After recovering the rotations, 

compensation is made for them and henceforth they can be treated as small. 

Points 

Point displacements are calculated in accordance with J. Oliensis, "A Multi-Frame Structure 
5 from Motion Algorithm under Perspective Projection", IJCV 34:2/3 1 63- 1 92, 1 999, 
incorporated herein by reference, the point displacements are 



« Qtm [qX - I )+ 0)^/%. ) + ) + o)lr^%^ ) where the last three terms give 




z,m z 



+ f\R\ gl^ ) where the rotational flow 




Under the assumptions, we can approximate 



10 the first-order rotational flow. Correspondingly, define the three length-2A^^ translational 



flow vectors 





rwi 
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and the length - 2N rotational flows 



f(?)}. 



-(0 



Then collect the shifts, 5-^, s'^^ into a A^- x IN^ matrix 5, where each row corresponds to a 
different image / and equals [{s'!^}^{'S'v}^]- Then 



5 « {r» {t; + {7; }a>: + K W + k + K W • 



(2) 



Lines 



The line measurement model is described as follows. One can reasonably assume that the 
noise of a measured line is proportional to the integrated image distance between the 
15 measured and true positions of the line. Let ©p^y denote the field of view (FOV) in radians. 
Typically, 0pov < 1 . If the measured line differs irom the true one by a z-rotation, this gives 
a noise of o((0pov / 2^ ). A rotation around an axis in the x-y plane gives a noise of 
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O(0pQy ), which is typically larger. (These estimates reflect the fact that the displacement 
region bounded by the true and measured lines look like two triangles in the first case and i 
quadrilateral in the second. For the representation described herein, this implies that line 



measurements determine more accurately than^^,^^ by a factor roughly of j/^ 



FOV 



The Image flow for lines can be described as follows. Let A = {A;0) 
and B = be the homogeneous length-4 vectors corresponding to the plane normals A 

and B for the Une L. Let Q = be the homogeneous vector corresponding to a point on 
L. Then A 'B = A Q = B'Q = 0, After rotation R and translation T, we determine the 
10 transformed A'B' fi*om the requirement that ^4' • = 5' • 0' = 0 . We can satisfy this 



requirement using A* - 



RA 
TA 



RB 
T'B + B, 



But A* doesn't necessarily have ^4=0, 



as the A' for the new image of the line must. So we set ^' = ^ -A^B / B^, which implies 



that the new image line is given hy A' = R 



A - B 



T 'A 



T ^B-hBj 



(3) 



Now consider the line flow SA^ = Aj - Af . Define SA^^ =A'-R *A' and 



15 SAr=R~'A' -A = - 



1-TB 



B so 6A = 6 At +5Ah . For small rotations and translations, with 



TB «1, 



(4) 
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B • A = 0 implies that SAA^O. To eliminate the scale ambiguity in defining each A, we 
take |A{*j = 1 in the reference image and require that the line flow satisfies 

0 = SAI ' A^ = [Aj -Af)* Aj exactly. We normalize all A^ to the same magnitude in the 
reference image to reflect the fact that the measurement errors should be similar for all lines. 
5 The requirement that 0 = • A^ "fixes the gauge" in a way that is consistent with our line 
representation and small-motion assumption. 

After "gauge fixing" each SA] incorporates just two degrees of fi-eedom. We represent SA] 
by its projection along two directions, A, x (z x ^^)and zxAj, which we refer to respectively 
as the upper and lower directions. Let the unit 3 -vector and project onto these two 
10 directions. For typical 0^^^, < l,\Ai 'z\«l and the upper component of A^ roughly equals 
Ai , . Thus, image measurements determine the upper component more accurately than the 
lower, by roughly l/Q^^y . In analogy to the point definitions, define the line translation 



flows 



MM 



where these are length- 2 A^^ vectors. By =B'Py and B^^BP^ are the upper and lower 
components of B. Similarly, define the rotational flows 
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{P,(yxA)} 

{P^-(zxA)}- 
{P,-(zxA)}_ 
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(5) 



Let A be theiV^ x matrix where each row corresponds to a different image / and equals 



jp^^-JA'f {Py-<5A'f . Then 



(6) 



Intensities 

One can apply the same arguments to the d)^ , the image shifts corresponding to the 3D point 
10 imaged at the pixel position p„ , as to the s'„, for the tracked feature points. Thus 

d; «z;'(px-[ri)+^.;r(')(pj+^y^)(pj+fi>:r(^^^ Let V/„ -V/(p„)=(/,,/,^ 

represent the gradient of the image intensity for , with some appropriate definition for a 
discrete grid of pixels. Let A/^ define the change in intensity with respect to the reference 



image. Its simplest definition is A/^ = . The brightness constancy equation is 



15 A/>V/ •d;,=0 

n n n 



(7) 



This and the previous equation imply that 

-M:= V/„ -d; « Z;'(v/„ •p„r - V/„|t'1)+ V/„ .(ty^ri') +<rf) +«;r^'). Define the three 
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length -A^^ translational flow vectors O/^ =-{z"7j,Oj,^ ^-{z"7j,cD;^ = {z~*(V/-p)), and 
the length - rotational flows, ^ -{v/ • v^'^ip)}, ^ -{v/ • r(')(p)), 4^^^ ^ -{v/ - r<'^(p)) . 

Let A hQSi NjxN^ matrix, where each row corresponds to an image / and equals {aI'J . 
Then A « {^Ml ^ i^M ^ • (8) 

5 

Reconstruction 

The reconstruction of the 3D scene is described as follows. Define the total rotational flow 
vectors for points, lines and image intensities by = ]» similarly for the 

y and z components. Here co^ and co^ are constant weights that the user should set according 

10 to the relative noisiness of the point, line, and intensity data. The have length 

2Np + 2N^ -^^x = ^tot. ' verify from the definitions that the W are computable 

from measured quantities and can be taken as known. Using Householder matrices, one can 
compute a {N^^^ - 3)x A^^^^ matrix H annihilating the three . Computing and 
multiplying by H cost O (N^^^ ) computation. Define the Nj x A^^^^ matrix 

15 D = [5 6>lA co^a]. Let C be a constant (A^/ -1)x(7V/ -l) matrix with C... =<J.. +1, we 
include C to counter the bias due to singling out the reference image for special treatment. 



Define D = C ^DH^ . Define the total translational flow vectors by = 



X 



, and 
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similarly for^- and z. From above, equations (2), (6) and (8) imply that 

Dc// « c'^' {t, }o ^h'' + c"^ {t^ plR" + c'^ {t; pin" . (9) 

D^// is approximately rank 3. 

Our basic algorithm of the method of the present invention is, 
5 1) Define H and compute D^^ . Using the singular value decomposition (SVD), 

compute the best rank-3 factorization of D^^^^ w M^^'^S^^^^ where M^^\ S^^^ are rank 3 
matrices corresponding respectively to the motion and structure. 

2) S^'^ satisfies, [o^ oj- H'^5^'^t/ + [t^ (10) 

where U and Q are unknown 3x3 matrices. We eliminate the unknowns , and Z"* 
10 from the in this system of equations to get 3N^^^ linear constraints on the 18 unknowns in 
U and Q . We solve these constraints with 0{n^^^) computations using the SVD. 

3) Given U and Q , we recover the structure unknowns , and Z"^ from 

[o^ ] = U'^S^^^U + [^^ ]q where Q is the 3d coordinate for an image pixel in the 

reference image, B is the shortest vector from the camera center to a 3D line and Z is the 
15 depth from the camera to a 3D scene alone the cameras optical axis. 

4) Given U , we use S^^'^U « [o^ o j and 

Dc// « C"^ {t; }WH^ + C"^ {t^ ^lU^ + C"^ {t; }0 [H^ to recover the translations. 

5) We recover the rotations o)'^ , 0)[, , 0)[ from 
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The description above omits some steps, which are important for bias correction. 1) The 
upper line components in D are weighted by a factor proportional to IC. , to account for 

/ ^FOV 

the greater accuracy in the measurement of these components. 2) To correct for bias due to 
the fact that the FOV is typically small, in Steps 2-4 we weight by a factor proportional to 

the FOV and weight by the inverse of this, while leaving the x and j; 

Components untouched. 3) The steps of the algorithm can be iterated to give better results 

and correct for the small motion assumptions. This involves multiplying the original feature 

point shifts by 1 -Z"*r, and the line shifts by 1 - B - T . The algorithm is guaranteed to 
converge to the correct reconstruction if the motion and noise are small and the camera 
positions do not lie on a plane. 

Of course, the results for the intensity-based part of our algorithm depend crucially on the 
technique used for computing derivatives. To make this more robust, one should iteratively 
reduce the size of the displacements d'„ by warping and recomputing the reconstruction in a 
coarse-to-fine mode. One implementation simply computes the image derivatives at a single 
scale, using consistency between the spatial derivatives computed for different images to 
determine whether the assumption of small image motion holds for this current scale. Only 
those pixel sites where the assumption does hold are used for the motion computation. 
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We have sometimes found it useful to preprocess the image intensities prior to running the 
algorithm. We first compute a Laplacian image and then transform the intensities by a 
stigma function to enhance edgy regions and suppress textureless ones. This gives the 
intensity image a form that is intermediate between the unprocessed images and a selected set 
5 of tracked features. 

Bas-relief for Lines 

Due to the bas-relief ambiguity, it is difficult to recover the constant component of the vector 
jZ"^ } for point features, though typically one can recover all other components accurately. A 




10 similar result can be derived for lines. The rotational flow for a line is x A . For typical 
®Fov « 1' I A' I « |A,v| • Thus X and y rotations give flows roughly proportional to 
j) X ^ - -A J and (x x ^) - -A^z . From equation (4) above, the effects of a z-translation 



constant component from rotational effects, which makes this component difficult to recover 
15 accurately. 

One can get a more quantitative analysis of this effect as follows. Assuming that the input 




are suppressed by U I , and the translational flows due to the constant component of {b^ } 



data consists of lines only, [o^ (D^ a)J= H^'^^'^fZ + f^, ^,]q implies that 
P^h[Ol^, <I)l,,,Oi^J = 0 where is a projection matrix that annihilates S^^K Those {b} 
components that produce small overlaps O^^H^P^HOl^ , will also produce small 



NECI1071 



18 g\nec\n96\13701\spec\13701pjl 




13701.pje 

overlaps ^L^^^^^^*^ ' follows from standard perturbation theory that noise will 
mix these components into the true solution for the B, where the amount of contamination is 
inversely proportional to the size of the eigenvalue. Define B = [{B^ }; {b^ \ {b^ }]. We 

computed for several sequences the eigenvalues of , defined such that 
5 B^H^B = ^^^IJi^^^La 5 found that, as expected, there was one small eigenvalue, 

whose eigenvector approximately equaled the constant component of {5.}. Thus one can use 
the input data to estimate the inaccuracy in recovering this component. 
General-Motion Algorithm for Intensities 
The General Motion Algorithm for just Intensities is as follows. 
10 0. First, assuming that the translations are zero, we recover the rotations and warp all 

images /*..../^'-* toward the reference image 7*^. In this case we then let the image 
displacements d^ refer to the unrotated images. 

-y T 

1 . We then compute H and A^^ = C AH , and using the singular value 
decomposition (SVD), compute the best rank-3 factorization of - A^.^ D^^f^ a M^^^S^^^^ , 
15 where M^^^ and S^^^ are rank 3 and correspond respectively to the motion and structure. 

2a. Further S^'^ satisfies, O = H^S^'^U + 



where U and Q are unknown 3x3 matrices. We eliminate the unknowns Z ' from the 
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in this system of equations to get ^ linear constraints on the 18 unknowns in U and Q . 
We solve these constraints with o{n ^ ) computations using the SVD. Then given U and 

Q , we recover the structure unknowns Z"^ from (5). 

2b. Then given U , we use S^^'^U « <D and equation (4) to recover the translations. 

5 

3 . The rotations W can then be recovered from c"^ W^ ^ = -C"^ (tO ^ + a) . 

Algorithm Analysis 
Step 0. 

10 Our neglect of the translations in the basic general-motion algorithm above produces 

small errors in recovering the rotations when the translations are moderate, with 

l^/Ll^^min ^1/4 as described in "Rigorous Bounds for Two-Frame Structure from Motion," 

J. Oliensis lUW, 1225-1231, 1994. We describe two multi-frame techniques for recovering 

-y ~y 

the rotations. Neglecting the translations in (4) gives - C A « C W^ , which can be 
15 solved for W. This technique resembles the projective algorithm of L. Zelnik-Manor and M. 
Irani, described in "Multi-Frame AHgnment of Planes", CVPR 1:151-156, 1999 for 

-1/ 

reconstructing planar scenes, except that our use of C reduces the bias due to 
overemphasizing the noise in the reference image. As usual, one can extend the technique to 
larger rotations by iteratively warping and re-solving for W. 
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The technique just described is best for small rotations, since it uses the first-order 
rotational displacements. Another approach begins with the extension to planar scenes as 
described in "Structure from Motion from Points, Lines and Intensities, J. Oliensis and M. 
Werman CVPR 2000 of the multi-frame Sturm-Triggs technique set forth in "A factorization 
5 based algorithm for multi-image projective structure and motion", P. Sturm and B. Triggs, 
ECCV 709-720, 1996. This multi-frame approach recovers homographies of arbitrary size 
between the reference image and the subsequent images. One can recover the rotations by 
choosing the orthogonal, positive-determinant matrices closest to the recovered 
homographies. 
10 Step 2a. 

From (4), the columns of S^^^ generate approximately the same subspace as that 
generated by the , , <I) , : 

S^'^U « HO) 

where U is a 3 x 3 matrix. (7) Above gives an overconstrained system of linear equations 
15 that one can solve directly for U and . However this is an o{n^^ ) computation. For 



20 ^-We eliminate the Z;' from (8)\bove and solve directly for U and Q . Denote the 



large images, we use instead the following OyN^ ) technique. 



First we define P = H^H , a projection matrix. Multiplying (7) by gives 



H^S^^^U « pep , which implies 



Oj^H'^S^'^ + ^Q 
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columns of Uhy U = [t/, k/j ^ ]> similarly for D . Let = s'^U^ and Q3 = s'^Q^ , 
where the scale s equals tha average distance of the image points from the image center. We 
include s to reduce the bias ^f the algorithm. From the definitions of 3> ^ , , O , , (8) above 

implies /.Jh^sWc/, +^f^lW.„[H^s(^)t/, +4^0 /^Jh^S^^^C/; +^Q;i 
5 -^"*(V/„ 'p^H^S^^^U^ -^"Va^l ^I^,^[h'^S^^^U!,+'¥Q\\ and a similar equation 

with {x<r^ y). Step 2 A of our algorithm solves this system of linear equations for Q and U 
in the least-squares sense and thfen recovers the Z„'* from these solutions and (8) above. The 
computation is o{Np), Note also that step 2a bases its computations on V/ . It we use the 

value of V/ computed from the measured reference image/*^ ,then the estimates of U, Q, Z^^ 
10 will not be true multi-frame estimates. To get a better multi-frame estimate, one can first re- 
compute and V/ based on all tlife image data and use the result to compute U, Q via (9) 
as follows. \ 

Let A% be the best rank 3 approximation to A^^ = C ^AH^ . Up to unknown 
rotational flows, A^c^K gives the intensity shifts due to the translational displacements: 

15 C'^A^A^^J.H + WY'' 

We then solve (10) in the least-squares sense for W, which gives improved estimates 
A^ for A and for . These can then be used in (9). However, this computation of 
gives o[nj^o)^ , Nfr^Z'^ ) errors. If the original noise in is less than this, one should use 
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directly in step 2a. 

Step 2b. 

We haveC'^T « M^^^C/y, , where Uj is also a 3 x 3 matrix. The 
algorithm recovers Uj and T by solving the linear system - A^// - M^^^C/y^O^H^, 
5 for Uj- 5 using the O computed previously, and then plugging in the result to recover T . 

General Notes 

Experimentally, we find that the intensity pattern varies significantly fi-om image to 
image in our sequences. We actually apply our approach to a modified version of the original 
10 sequence, obtained by: 

1) Filtering the sequence with a laplacian of gaussian to emphasize edges; 

2) Applying a nonlinear transformation (a sigma fiinction) to further emphasize the 
high-frequency features. The new sequences give a continuous representation of the 
interest features, which can be used to compute derivatives. This procedure 

15 emphasizes the discrete, more easily trackable features, but does not eliminate the 

information from other image regions. One could extend this approach to deal with a 
variety of interest features, selecting among them to determine the derivatives that are 
likely to give the best flow. To ensure that the BCE is approximately valid, we also 
require that the spatial derivatives be consistent over the whole sequence, and that the 

20 image intensity be well approximated by a plane at the scale at which we are 
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working. In our current implementation, we apply the algorithm at a single scale. 
In another embodiment of the present invention, the algorithm described above can also 
handle: cameras with changing and unknown focal lengths, where the calibration is otherwise 
fixed and known and camera calibrations that change arbitrarily from image to image in an 
5 unknown fashion (this is know as the projective case). 

For varying, unknown focal lengths, one can modify the algorithm along the line of the 
method described in "A Multi-frame Structure from Motion Algorithm under Perspective 
Projection" IJCV 34:2/3, 163-192, 199 and Workshop on Visual Scenes 77-84, 1999. The 
modification is described as follows, in Step 2a, we define H to annihilate = {p^ • ^l(Pn )} 
10 addition to ^^^^y^^z - recovering the Z„"' one must replace ^in (8) by [^,^^], and 

Q becomes a 4x 3 matrix. The modified Step 2a can recover the Z"' except for the constant 
(Z„ = l) component, which is intrinsically difficult to recover accurately when the focal lengths 
are unknown and varying. 

For the projective case, the algorithm can be modified along the lines of the method 
15 described in "Fast Algorithms for Projective Multi-Frame Structure from Motion", J. Oliensis 
and Y. Gene, ICCV 536-543, 1999. In this case, we define H to annihilate eight length- A^^ 
vectors, where the «-th entries of these vectors are given by the eight quantities 
PniC^^nX' P«rf(^^JPn'for fl, 6, c, £ {x, . Stcp 2 a must bc modified by replacing 
^ in (8) by a matrix consisting of these eight vectors. 

20 
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Projective Algorithm 

The algorithm described here for calibrated sequences generalizes in a straightforward way to 
deal with uncalibrated sequences. In this projective case, instead of a preliminary stage of 
rotation computation, one computes planar homographies between the reference image and 
5 each subsequent image. It is worth noting that one can easily and accurately compute these 
homographies by a multi-frame generalization of the projective-reconstruction method. 
We briefly describe how this works for the example of tracked points. Assume the scene is 
planar. Then ;Lt,q: =M'S^ (11) 



10 where M' is a 3x3 matrix (a homography), is the structure 3 -vector giving the position 
of the m-th point on the plane (in some arbitrary coordinate system), and X^^ is the projective 
depth. The steps of the homography recovery are: 

1) Take =1 initially. 

2) For the current estimate of the , collect the >i^q^^ into a single 3A^/ x 

15 matrix F . Use the SVD to decompose this into the product of two rank 3 factors: 

3) For the current M^^\ 5^^^ , compute the minimizing the |r -M(^^5^^^| . Return 
to Step 2. 

After convergence, M^^^ contains the M' estimates. Our estimate of the homography taking 
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the reference image to image / is M ^ \ M' . This procedure is guaranteed to converge to the 



p(4)P. / 

local minimum of '/ 2 ? where T^"^^ is the difference between F and its best rank-3 




approximation. Also, it gives nearly the optimal estimates of the M' if the true motion is not 
too large and the scene is truly planar. 

5 

The uncalibrated version of the line-based algorithm is described as follows. Let K' be the 
standard 3x3 matrix, with zero lower-diagonal entries, giving the calibration parameters for 
the /-th image. For the uncalibrated case, equation (3) still holds if one replaces 

R -> {K'Y^RK = and T ^ iTT , where K and K' are the calibration matrices for the 
10 first and second image. Af„ is a general homography and is the inverse transpose of the 
analogous homography for points. The first order approximation of the new version of 



A' = R 



A-B- 



is M.' «(T A)B+fi;xA,butwith T->AT and with 



TB + B, 

0)xA replaced by SM„A , where M ^ = 1 + SM„ . 

We define H to annihilate the first-order homography flow due to SM^ , instead of the 
15 rotational flows as for the calibrated case. Since the first-order approximation of M]/ is 
1 - SM I, , one can easily define H to annihilate the first-order flows for both points and lines. 



Represent = 



FG' 



, where F is 2x2 and G and 7 are 2x1. The first-order point and 
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line displacements due to each of the 8 parameters in dM are given in the following table: 





F... 




^2, 




G, 


G2 


J. 


h 


Ax: 


Ax 


Ay 


0 


0 


Az 


0 


0 


0 


Ay: 


0 


0 


Ax 


Ay 


0 


Az 


0 


0 


Az: 


0 


0 


0 


0 


0 


0 


Ax 




<lX: 


-X 


0 


-y 


0 


-x2 


-xy 


-1 


0 


qy. 


0 


-X 


0 


-y 


-xy 




0 


-1 



We define H to annihilate the 8 vectors associated with the columns of this table. After the 
5 indicated replacements, the uncalibrated algorithm is the same as the calibrated one. 

If one fixes the point coordinates in some reference image, the remaining freedom under a 

projective transform is Z"* — > ax + 6y + c + dZ~^ where x and y are the image coordinates. 

This corresponds to scaling and adding an arbitrary plane to the structure. 

One can derive a similar relation for B. Let P be a 3D point on the line determined by B. 

10 Then B • P =- 1 implies B • (x, y,\) = -Z"^ . A projective transform must preserve B' • P' = - 1 . 

From this, and the transform of Z"' given above, it follows that B'-B = (a,6,c) up to a 
scaling, with the same constants a, b, c as for the point transform. This relation holds with 
the same constants for any line B. 

15 The fact that B is recoverable only up to an overall shift is also clear from the uncalibrated 
version of the algorithm. When we use H to eliminate the first-order effects of small 
homographies, we also eliminate the translational image flows due to the three constant 
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components of the B. Thus, these components cannot be recovered. 
Implementation 

5 It will be apparent to those skilled in the art that the methods of the present invention disclosed 
herein may be embodied and performed completely by software contained in an appropriate 
storage medium for controlling a computer. 

Referring to Fig. 1, which illustrates in block-diagram form a computer hardware system 
10 incorporating the invention. As indicated therein, the system includes a video source 101 , 
whose output is digitized into a pixel map by a digitizer 1 02. The digitized video frames are then 
sent in electronic form via a system bus 103 to a storage device 104 for access by the main 
system memory during usage. During usage the operation of the system is controlled by a 
central-processing unit, (CPU) 105 which controls the access to the digitized pixel map and the 
1 5 invention. The computer hardware system will include those standard components well-known 
to those skilled in the art for accessing and displaying data and graphics, such as a monitor, 106 
and graphics board 107. 

The user interacts with the system by way of a keyboard 108 and or a mouse 109 or other 
20 position-sensing device such as a track ball, which can be used to select items on the screen or 
direct functions of the system. 
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The execution of the key tasks associated with the present invention is directed by instructions 
stored in the main memory of the system, which is controlled by the CPU. The CPU can access 
the main memory and perform the steps necessary to carry out the method of the present 
5 invention in accordance with instructions stored that govern CPU operation. Specifically, the 
CPU, in accordance with the input of a user will access the stored digitized video and in 
accordance with the instructions embodied in the present invention will analyze the selected 
video images in order to extract the 3D structure and camera motion information fi-om the 
associated digitized pixel maps. 

10 

Referring now to Fig. 2 the method of the present invention will be described in relation to 
the block diagram. A reference image of the sequence is taken by a camera at a reference 
perspective and one or more successive images of the sequence are taken at one or more 
successive different perspectives by translating and/or rotating the camera in step 201 . The 

15 images are then digitized 202 for analysis of the 3D image content, i.e. points, lines and 
image intensities. Then image data shifts for each successive image with respect to the 
reference image are determined 203; the shifts being derived from the camera translation 
and/or rotation from the reference perspective to the successive different perspectives. 
A shift data matrix that incorporates the image data shifts for each image is then constructed 

20 204 and two rank-3 factor matrices from the shift data matrix using SVD, one rank-3 factor 
matrix corresponding the 3D structure and the other rank-3 factor matrix corresponding the 
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camera motion are calculated 205. Recovering the 3D structure from the 3D structure matrix 
using SVD by solving a linear equation 206. The camera motion can then be recovered using 
the recovered 3D structure 207. 

While there has been shown and described what is considered to be preferred 
embodiments of the invention, it will, of course, be understood that various modifications and 
changes in the form or detail could readily be made without departing from the spirit of the 
invention. It is therefore intended that the invention be not limited to the exact forms described 
and illustrated, but should be constructed to cover all modifications that may fall within the 
scope of the appended claims. 
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